CallSim: Evaluation of Base Calls Using Sequencing Simulation
Authors
Abstract
Accurate base calls generated from sequencing data are required for downstream biological interpretation, particularly in the case of rare variants. CallSim is a software application that provides evidence for the validity of base calls believed to be sequencing errors, and it is applicable to Ion Torrent and 454 data. The algorithm processes a single read using a Monte Carlo approach to sequencing simulation and does not depend on information from any other read in the data set. Three examples, drawn from general read correction and from error-or-variant classification, demonstrate its effectiveness as a robust base corrector for low-volume read processing. Specifically, correction of errors in Ion Torrent reads from a study involving mutations in multidrug-resistant Staphylococcus aureus illustrates an ability to classify an erroneous homopolymer call. In addition, support for a rare variant in 454 data for a mixed viral population demonstrates "base rescue" capabilities. CallSim is intended for hands-on downstream analysis of sequences produced by 454 or Ion Torrent systems. These downstream efforts, although time-consuming, are necessary steps for accurate identification of rare variants.
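The abstract does not describe CallSim's simulation model, so the following is only a minimal sketch of the general idea of evaluating a single homopolymer call by Monte Carlo simulation. It is not CallSim's algorithm. The function name `simulate_call_fraction`, the Gaussian flow-signal noise model, and the parameters `base_sd` and `sd_per_base` are all assumptions made for illustration.

```python
import random


def simulate_call_fraction(true_len, called_len, trials=10000,
                           base_sd=0.1, sd_per_base=0.05, seed=0):
    """Estimate how often a homopolymer of length `true_len` would be
    called as length `called_len`.

    Assumed (hypothetical) noise model: the flow signal is Gaussian,
    centred on the true length, with a standard deviation that grows
    linearly with the length. The call is the signal rounded to the
    nearest non-negative integer.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    sd = base_sd + sd_per_base * true_len
    hits = 0
    for _ in range(trials):
        signal = rng.gauss(true_len, sd)
        call = max(0, round(signal))
        if call == called_len:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Under this toy model, an undercall by one base is far more likely
    # for a long homopolymer than for a short one.
    print("true 2, called 1:", simulate_call_fraction(2, 1))
    print("true 6, called 5:", simulate_call_fraction(6, 5))
```

Under these assumptions, a high simulated fraction for "true length n, called length m" lends support to treating an observed call of m as a sequencing error of a true n-mer, while a very low fraction suggests the call is more likely genuine.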
Similar resources
Improving the Accuracy of Base Calls and Error Predictions for GS 20 DNA Sequence Data
New DNA sequencing technology implemented in the GS 20 sequencer reduces cost and time in exchange for lower accuracy. DNA sequencing errors negatively impact downstream applications and therefore accurate base calls and error probabilities are invaluable to researchers. This paper applies a graphical model to the base calling problem in context of the GS 20 sequencer. This model integrates sig...
High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing.
A major limitation of high-throughput DNA sequencing is the high rate of erroneous base calls produced. For instance, Illumina sequencing machines produce errors at a rate of ~0.1-1 × 10^-2 per base sequenced. These technologies typically produce billions of base calls per experiment, translating to millions of errors. We have developed a unique library preparation strategy, "circle sequencing...
Lit Lunch – December 6th
A major limitation of high-throughput DNA sequencing is the high rate of erroneous base calls produced. For instance, Illumina sequencing machines produce errors at a rate of ~0.1–1 × 10^-2 per base sequenced. These technologies typically produce billions of base calls per experiment, translating to millions of errors. We have developed a unique library preparation strategy, “circle sequencing,” w...
Use of a neural network to predict normalized signal strengths from a DNA-sequencing microarray
A microarray DNA sequencing experiment for a molecule of N bases produces a 4xN data matrix, where for each of the N positions each quartet comprises the signal strength of binding of an experimental DNA to a reference oligonucleotide affixed to the microarray, for the four possible bases (A, C, G, or T). The strongest signal in each quartet should result from a perfect complementary match betw...
Corrigendum: Comparative evaluation of DNase-seq footprint identification strategies
DNase I is an enzyme preferentially cleaving DNA in highly accessible regions. Recently, Next-Generation Sequencing has been applied to DNase I assays (DNase-seq) to obtain genome-wide maps of these accessible chromatin regions. With high-depth sequencing, DNase I cleavage sites can be identified with base-pair resolution, revealing the presence of protected regions ("footprints"), correspondin...
Journal title:
Volume 2012, Issue
Pages -
Publication date: 2012